A systematic review and meta-analysis of fusion rate enhancements and bone graft options for spine surgery

Our study aimed to evaluate differences in the outcomes of patients who underwent spinal fusion with different grafts, measuring effectiveness by spinal fusion rates, pseudarthrosis rates, and adverse events. Following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses statement, this systematic review and meta-analysis identified 64 eligible articles. The main inclusion criterion was adult patients who underwent spinal fusion in which autologous iliac crest (AIC), allograft (ALG), alloplastic (ALP; hydroxyapatite, rhBMP-2, rhBMP-7, or a combination of them), or local bone (LB) graft was applied, with or without metallic implants. We compared these groups to evaluate differences in outcomes, such as fusion rate, hospital stay, follow-up extension (6, 12, 24, and 48 months), pseudarthrosis rate, and adverse events. LB presented significantly higher proportions of fusion rates (95.3% CI 89.7–98.7) compared to the AIC (88.6% CI 84.8–91.9), ALG (87.8% CI 80.8–93.4), and ALP (85.8% CI 75.7–93.5) study groups. Pseudarthrosis presented a significantly lower pooled proportion in ALG studies (4.8% CI 0.1–15.7) compared to AIC (8.6% CI 4.2–14.2), ALP (7.1% CI 0.9–18.2), and LB (10.3% CI 1.8–24.5). ALP and AIC studies described significantly more cases of adverse events (80 events/404 patients and 860 events/2001 patients, respectively) compared to LB (20 events/311 patients) and ALG (73 events/459 patients). Most studies presented high risk-of-bias scores. Based on fusion rates and adverse events proportions, LB showed a superior trend among the graft cases we analyzed. However, our review revealed highly heterogeneous data and a need for more rigorous studies to better address and assist surgeons’ choices of the best spinal grafts.

Similarly, the market for spinal implants and devices was estimated at $7 billion in sales between 2013 and 2014 3 , reflecting an increase in material availability. However, the current literature insufficiently confirms the superiority of one intervention or graft 4,5 .
Like any other surgical intervention, spine fusions can lead to unexpected outcomes, such as pseudarthrosis or other adverse events. Pseudarthrosis can be defined as a solid fusion failure, whether symptomatic or asymptomatic, that can increase the risk of neurologic symptoms, material failure, and deformity 6,7 . To make appropriate decisions, surgeons must weigh the effectiveness versus costs of each graft type.
Autologous iliac crest (AIC) graft has been considered the gold standard treatment for spinal fusion because of its histocompatible and non-immunogenic properties, presenting higher amounts of cancellous bone, growth factors, and pluripotent cells related to osteoinduction, osteogenesis, and osteoconduction [8][9][10] . Unfortunately, spinal fusions with AIC have been associated with several morbidities, such as a higher incidence of infection, donor site pain, hematoma development, increased operative time, and blood loss [11][12][13][14][15][16] . As a consequence of AIC's drawbacks, alternative grafts have been developed, and these alternatives are increasingly diverse and available. Such materials can be classified as extender, enhancer, or substitute grafts 17,18 . An extender decreases the need for large amounts of autologous bone grafting (ABG) while offering the same bone formation properties as AIC 17,18 . An enhancer is a material combined with ABG to increase successful fusion rates compared to ABG alone 17,18 . A substitute replaces an ABG and presents the same or higher healing success rates compared to ABG alone 17,18 .
These materials are often assembled in various proportions to achieve spinal fusion 17 . However, allograft (ALG) and alloplastic (ALP) grafts are foreign bodies that carry some inherent risks. Considering their pros and cons, AIC use is favorable since AIC need not be associated with other grafts to achieve reliable results 19 . One frequently used alternative is local bone (LB); however, to the best of our knowledge, previous studies have only compared autologous bone grafts with ALP and ALG, without subdividing autologous bone grafts into LB and AIC. This distinction is crucial: if AIC or other nonlocal bone can be avoided, so can the associated postoperative morbidity, especially residual pain and an extra wound or scar.

Methods
This study was conducted according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement 20 . A comprehensive web-based literature search was conducted through January 2021, using three databases (Lilacs, PubMed, and Cochrane), by two independent authors (SAF and ASMS), without publication-language restrictions. For all databases, controlled vocabulary and text word searches were performed, using a combination of the keywords: "spinal fusion AND autograft AND spinous process", "spinal fusion AND autograft AND spinal lamina", "spinal fusion AND autograft AND iliac crest", "spinal fusion AND heterograft", "spinal fusion AND allograft AND spinous process", "spinal fusion AND allograft AND spinal lamina", "spinal fusion AND allograft AND iliac crest". Our search was directed toward adult patients who underwent spinal fusion in which ALG, ALP, LB, or AIC was applied. Given the increasing availability of medical devices, we compared these groups with one another to evaluate the presence of differences or superiority in outcomes, such as fusion rate, hospital stay, follow-up extension (6, 12, 24, and 48 months), pseudarthrosis rate, and adverse events.
Titles, abstracts, and full-text studies were reviewed according to pre-established criteria, and then the relevant data were extracted. Discrepancies were resolved by consensus with the remainder of the research team. This study's inclusion and exclusion criteria are presented in Table 1. Retrospective studies, prospective analyses, randomized clinical trials, and case series were included in this review. The cut-off date for the review was January 31, 2021.

Data extraction.
The following data were abstracted from all included studies: study design, year, patient demographics, preoperative assessment, intraoperative information, postoperative assessment, hospital stay, follow-up extension, fusion rate, pseudarthrosis rate (comprising reported data for nonunion and pseudarthrosis), and adverse events (graft-related, infections, and neurological). Data were partially (one graft-type group of interest) or fully (all graft-type groups in the study) extracted from comparative studies, in accordance with our inclusion criteria. Two investigators (SAF and ASMS) independently performed a systematic review of all identified citations. No attempts were made to contact the authors of the reviewed studies to obtain missing or unreported data. Our main outcome of interest was fusion rates, and secondary outcomes included pseudarthrosis and adverse event rates.

Risk of bias assessments and evaluations of validity.
The quality of eligible studies and their risk of bias (RoB) were examined by two reviewers (SAF and ASMS) using the methodological index for non-randomized studies (MINORS) 21 and the Cochrane Collaboration's tool for assessing RoB 22 in randomized controlled trials. For non-randomized studies, a high RoB was defined as a MINORS score ≤ 8 (control group not present) or ≤ 12 (control group present). For randomized controlled trials, each domain was classified as unclear bias, low RoB, or high RoB.

Heterogeneity assessments.
Heterogeneity between studies was examined using the I² statistic and the P-value for heterogeneity 23 . Substantial heterogeneity is defined as I² ≥ 50% 24 .

weighted proportion and its 95% confidence interval (CI) for each outcome of interest. MedCalc uses a Freeman-Tukey transformation to calculate summary proportions, weighted according to the number of patients described in each study. We determined the pooled proportion using a random-effects model. Data were summarized in tables and further stratified based on bone graft types (AIC, ALG, ALP [comprising hydroxyapatite, rhBMP-2, rhBMP-7, and titanium cages], and LB). The Kruskal-Wallis test was used to compare variables among the four groups, and post hoc analyses using Mann-Whitney U tests were performed to compare two groups. When multiple follow-up periods were available for a study, data from the last assessment were used for the combined analyses. Subsequently, the fusion rates stratified by bone graft substitutes (bone graft alone or combined with metallic implants) and follow-up periods (6, 12, 24, and 48 months) were further analyzed (subgroup analysis). Studies that did not report the timing of fusion rate assessments were excluded from this subgroup analysis.
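The pooling procedure described above can be illustrated with a minimal Python sketch. It is not the MedCalc implementation: it assumes the Freeman-Tukey double-arcsine transformation, a DerSimonian-Laird estimate of between-study variance for the random-effects model, and a simplified back-transformation (sin² of half the pooled value), none of which the text specifies in detail. The function names and the study counts in the usage example are hypothetical.

```python
import math


def freeman_tukey(events, n):
    """Return the double-arcsine transformed proportion and its variance."""
    t = math.asin(math.sqrt(events / (n + 1))) + math.asin(
        math.sqrt((events + 1) / (n + 1))
    )
    return t, 1.0 / (n + 0.5)


def pool_proportions(studies, z=1.96):
    """Pool (events, n) pairs with a DerSimonian-Laird random-effects model.

    Returns (pooled, ci_low, ci_high, i_squared_percent).
    """
    transformed = [freeman_tukey(e, n) for e, n in studies]
    t_vals = [t for t, _ in transformed]
    w = [1.0 / v for _, v in transformed]  # inverse-variance weights

    # Cochran's Q around the fixed-effect mean.
    fixed_mean = sum(wi * ti for wi, ti in zip(w, t_vals)) / sum(w)
    q = sum(wi * (ti - fixed_mean) ** 2 for wi, ti in zip(w, t_vals))
    df = len(studies) - 1

    # I^2: share of total variation attributed to heterogeneity.
    i_squared = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0

    # Between-study variance (method of moments), truncated at zero.
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Random-effects weights and pooled estimate on the transformed scale.
    w_re = [1.0 / (1.0 / wi + tau2) for wi in w]
    pooled_t = sum(wi * ti for wi, ti in zip(w_re, t_vals)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))

    def back(t):
        # Simplified inverse (assumption): clamp to [0, pi], then sin^2(t / 2).
        return math.sin(min(max(t, 0.0), math.pi) / 2.0) ** 2

    return back(pooled_t), back(pooled_t - z * se), back(pooled_t + z * se), i_squared


if __name__ == "__main__":
    # Hypothetical (fused, total) counts, not taken from the reviewed studies.
    example = [(45, 50), (38, 40), (80, 95), (27, 30)]
    pooled, low, high, i2 = pool_proportions(example)
    print(f"pooled = {pooled:.3f}, 95% CI {low:.3f}-{high:.3f}, I2 = {i2:.1f}%")
```

An I² value of 50% or more from such a calculation would correspond to the substantial heterogeneity threshold stated above.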
Further analysis (meta-regression) to identify factors related to fusion rates (surgical approach, pseudarthrosis, and adverse events) was unsuccessful because the methods used to report the data were inconsistent across studies.

Results
Study demographics. As designated by the PRISMA guidelines 20 , Supplementary Fig. 1

Participant demographics.
Patients' and procedures' characteristics are summarized in Table 2. Overall, patients' main diagnoses for surgical intervention were degenerative diseases (78.8%). A thorough analysis of follow-up, procedure duration, blood loss, and hospital length of stay (LOS) was impaired by a lack of systematic reporting. Data were inconsistent across studies; for instance, none of the ALG articles specified hospital LOS. Similarly, some variables, such as procedure time and blood loss in the ALG and ALP groups, were reported by a single author.

Pre- and postoperative assessments.
Patient assessments were not reported systematically, making this study's analysis difficult. Apart from distinct assessments during patients' clinical courses, such as weight and height (in preoperative assessments) and Odom's criteria (in postoperative assessments), pre- and postoperative assessments allowed matched analysis only for Japanese Orthopedic Association (JOA) score and Nurick grade reports in the AIC group. The same pattern was observed in the LB group (Frankel scale report) and ALP group (Frankel and JOA reports), as Table 3 shows.

Supplementary Fig. 4. Detailed information about rates and confidence intervals is presented in Supplementary Table 4.

Meta-analysis of secondary outcomes.
Only 32 studies described rates of pseudarthrosis: 17 in the AIC group, six in the ALP group, four in the ALG group, and five in the LB group. Pseudarthrosis presented a pooled proportion of 14.2% (CI 8.9–20.5%, I² = 74.2%, P < 0.0001) for the lumbar spine region (88 of 625 patients) versus 4.1% (CI 1.6–7.7%, I² = 76.6%, P < 0.0001) for the cervical spine (29 of 776 patients). According to applied grafts, pseudarthrosis achieved a significantly lower pooled proportion in ALG studies (four events among 243 patients, 4.8% CI 0.1–15.7) than in AIC (8.6% CI 4.2–14.2), ALP (7.1% CI 0.9–18.2), and LB (10.3% CI 1.8–24.5) studies.

Adverse events analysis was performed using three main categories: pain, infection, and graft-related events (graft collapse, fragmentation, protrusion, or breach). We also added donor site morbidity for the AIC sample. ALP and AIC studies described significantly more cases (80 events among 404 patients and 860 events among 2001 patients, respectively) than LB studies (20 events among 311 patients) and ALG studies (73 events among 459 patients). For our proportion analysis, we considered only events per patient. Table 4 displays our proportions analysis calculations based on the available data.
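As a rough cross-check of the counts reported above, the crude events-per-patient proportions can be computed directly. The sketch below uses only the event and patient totals given in the text; the variable and function names are illustrative. Crude proportions are simple ratios and are not expected to match the pooled random-effects estimates (14.2% lumbar, 4.1% cervical), which weight each study individually.

```python
# Event and patient counts as reported in the text.
adverse_events = {
    "AIC": (860, 2001),
    "ALP": (80, 404),
    "ALG": (73, 459),
    "LB": (20, 311),
}
pseudarthrosis_by_region = {
    "lumbar": (88, 625),
    "cervical": (29, 776),
}


def crude_proportion(events, patients):
    """Return events per patient as a percentage (no study-level weighting)."""
    return 100.0 * events / patients


if __name__ == "__main__":
    for label, (events, patients) in adverse_events.items():
        pct = crude_proportion(events, patients)
        print(f"Adverse events, {label}: {pct:.1f}%")
    for label, (events, patients) in pseudarthrosis_by_region.items():
        pct = crude_proportion(events, patients)
        print(f"Pseudarthrosis, {label}: {pct:.1f}%")
```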

Discussion
Through our primary outcome analysis, our study showed a higher proportion of fusion rates for LB (95.3%) compared to AIC (88.6%), ALG (87.8%), and ALP (85.8%). This finding was not expected since LB has less trabecular bone, which would theoretically result in less bone marrow and less availability of the pluripotent cells and growth factors 25 . Also, LB's limited harvestable volume narrows its surgical recommendations, and it is commonly applied to the cervical spine (which involves a smaller area to cover and less body load to sustain compared to the lumbar spine).
Our sample mainly comprised AIC (2529) patients, followed by ALP (766), ALG (516), and LB (366) patients. This size discrepancy could have exaggerated LB's apparent fusion effect among our pooled samples. Moreover, most studies did not present participants' baseline assessments, and since the fusion quality of distinct grafts can vary with age, metabolic activity, or graft-bed preparation 26,27 , confirming the superiority of LB graft fusions over the other studied options is challenging. Similarly, most of the reviewed studies did not follow the FDA's guidance for spinal fusion evaluations 28 , increasing their assessment bias. Additionally, the literature has often identified conflicting opinions regarding the optimal association between surgical techniques and patients' underlying predictive factors for spinal fusions and spinal grafts. Other meta-analyses that have considered assorted graft materials or surgical approaches have demonstrated higher fusion rates using rhBMP 27,29,30 or when grafts are associated with the anterior lumbar interbody fusion technique 31 . Moreover, minimally invasive procedures did not demonstrate fusion rate differences compared to open surgical techniques 32 .
Considering the data inconsistencies in our primary analysis, which precluded further associations (e.g., fusion rate × graft type × surgical technique), we performed a subgroup analysis of fusion rates with or without metallic implants. In this subgroup, LB presented lower fusion rates when associated with metallic implants, and this finding could be explained by LB's limitations in graft volume availability 33 and/or a small patient sample.
Pseudarthrosis rates and adverse events were studied as secondary outcomes. Our pseudarthrosis analysis revealed a higher proportional rate of pseudarthrosis in the lumbar spine (14.2%) than in the cervical spine (4.1%), consistent with previous analyses 6 , which has been explained by the increased difficulty of stabilizing areas that support higher loads 34,35 . Furthermore, our analysis of bone graft types revealed that LB presented a higher pooled proportional pseudarthrosis rate (10.5%). However, some considerations are worth mentioning. Pseudarthrosis rates were not systematically assessed across the reviewed studies (AIC 17 of 51 analyses; ALG 4 of 9 analyses; ALP 6 of 20 analyses; and LB 5 of 10 analyses), which could have exacerbated the discrepancy between patient quantity and analyzed effects. Similarly, the authors' descriptions of their results did not suggest that pseudarthrosis can be presumed directly for the cases not counted as fused in the fusion rate analyses. Moreover, the literature did not present a conclusive role governing bone grafts' influence on pseudarthrosis rates 6 .
Greater pseudarthrosis rates have already been associated with advanced age (because of delayed bridging maturation and increased bone resorption) 36 , degenerative disease, and construct length 6 . Longer fusions can enable loading distribution, minimizing excess motion and helping to decrease pseudarthrosis 34,37 . However, they can also increase points of load failure for each adjacent segment 34 , demand more grafts, and increase patients' exposure to complications (due to an extensive surgical intervention). Nevertheless, our literature review examined a limited sample for this subgroup analysis, and it included many studies with moderate to high heterogeneity, reflecting pseudarthrosis evaluations' diversity. For example, Choudhri et al. 38 recommend CT imaging with fine-cut axial and multiplanar reconstruction to evaluate spinal fusions. Nonetheless, no radiographic gold standard is available with which to evaluate pseudarthrosis 38 compared to open surgical exploration. Therefore, as in the literature, our review did not reveal a conclusive role governing bone grafts' influence on pseudarthrosis rates 6 .
Moreover, many available studies presented substantial methodological flaws regarding adverse events, limiting analyses. AIC pain corresponded to a 23.4% pooled proportional rate and a significant proportion of donor site morbidity (23.2%), corroborating the previously mentioned graft drawbacks already described in the literature [11][12][13][14][15][16] . Unsurprisingly, and as we have mentioned, foreign bodies can carry some inherent risks, which could explain ALP's higher pooled proportional rates of infection (10.2%) and graft-related events (35.1%).
Our study faced other limitations. Heterogeneity was found in different aspects of the reviewed studies' populations. This heterogeneity arose from clinical diversity in both treatment groups, compounded by insufficient analyses, a small pool of subjects, differences in assessing patients' baseline and outcomes, and the absence of systematic reports (e.g., the use of tobacco or nonsteroidal anti-inflammatory drugs could have led to a misinterpretation of fusion rates). Moreover, a standard tool for data collection could improve data availability for fusion rate analysis and pseudarthrosis assessment. Furthermore, we did not include all available ALP grafts because of their high variability, which could weaken the proportional analysis. An example is platelet-rich plasma, which is gaining recognition as an important adjunct in the spinal graft market 39 . Finally, an overall higher RoB, which could influence appraisals of interventions' effects, indicated a lack of structured randomized trials. Moreover, successful treatments should be interpreted in light of patients' diminished exposure to nosocomial events, acceptable survival rates, and function after treatment.
Comparing the inputs of more than three decades of medical evolution is challenging, given technical improvements, instrumental variations, and a greater range of materials. The competition for better outcomes versus materials will continue, as well as the difficulty of keeping up with medical updates and discerning industry interests. Structured clinical trials are highly encouraged to promote the availability of optimal, cost-effective treatments for patients. The findings of our analysis demonstrate a substantial variety of spinal grafts and the need for more rigorous studies to better address and assist surgeons in choosing the best graft options. Standardized methods to evaluate spinal fusion and pseudarthrosis are encouraged.